14 research outputs found

    Predicting asthma attacks in primary care: protocol for developing a machine learning-based prediction model

    INTRODUCTION: Asthma is a long-term condition with rapid onset worsening of symptoms ('attacks') which can be unpredictable and may prove fatal. Models predicting asthma attacks require high sensitivity to minimise mortality risk, and high specificity to avoid unnecessary prescribing of preventative medications that carry an associated risk of adverse events. We aim to create a risk score to predict asthma attacks in primary care using a statistical learning approach trained on routinely collected electronic health record data. // METHODS AND ANALYSIS: We will employ machine-learning classifiers (naïve Bayes, support vector machines, and random forests) to create an asthma attack risk prediction model, using the Asthma Learning Health System (ALHS) study patient registry comprising 500 000 individuals across 75 Scottish general practices, with linked longitudinal primary care prescribing records, primary care Read codes, accident and emergency records, hospital admissions and deaths. Models will be compared on a partition of the dataset reserved for validation, and the final model will be tested in both an unseen partition of the derivation dataset and an external dataset from the Seasonal Influenza Vaccination Effectiveness II (SIVE II) study. // ETHICS AND DISSEMINATION: Permissions for the ALHS project were obtained from the South East Scotland Research Ethics Committee 02 [16/SS/0130] and the Public Benefit and Privacy Panel for Health and Social Care (1516-0489). Permissions for the SIVE II project were obtained from the Privacy Advisory Committee (National Services NHS Scotland) [68/14] and the National Research Ethics Committee West Midlands-Edgbaston [15/WM/0035]. The subsequent research paper will be submitted for publication to a peer-reviewed journal and code scripts used for all components of the data cleaning, compiling, and analysis will be made available in the open source GitHub website (https://github.com/hollytibble)

    A Data-Driven Typology of Asthma Medication Adherence using Cluster Analysis

    Asthma preventer medication non-adherence is strongly associated with poor asthma control. One-dimensional measures of adherence may ignore clinically important patterns of medication-taking behavior. We sought to construct a data-driven multi-dimensional typology of medication non-adherence in children with asthma. We analyzed data from an intervention study of electronic inhaler monitoring devices, comprising 211 patients yielding 35,161 person-days of data. Five adherence measures were extracted: the percentage of doses taken, the percentage of days on which zero doses were taken, the percentage of days on which both doses were taken, the number of treatment intermissions per 100 study days, and the duration of treatment intermissions per 100 study days. We applied principal component analysis on the measures and subsequently applied k-means to determine cluster membership. Decision trees identified the measure that could predict cluster assignment with the highest accuracy, increasing interpretability and increasing clinical utility. We demonstrate the use of adherence measures towards a three-group categorization of medication non-adherence, which succinctly describes the diversity of patient medication taking patterns in asthma. The percentage of prescribed doses taken during the study contributed to the prediction of cluster assignment most accurately (84% in out-of-sample data)
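As a rough illustration of the five adherence measures listed in the abstract above, the sketch below derives them from a per-day dose count for one patient. It assumes a twice-daily regimen (implied by "both doses") and treats a "treatment intermission" as a run of consecutive zero-dose days; the abstract does not give the study's exact definition, so `min_gap_days` and the function name are hypothetical.

```python
from itertools import groupby


def adherence_measures(daily_doses, prescribed_per_day=2, min_gap_days=1):
    """Compute five adherence measures from a list of doses taken per day.

    Assumption: an intermission is a run of >= min_gap_days consecutive
    days with zero doses (the study's own definition may differ).
    """
    n_days = len(daily_doses)
    if n_days == 0:
        raise ValueError("daily_doses must contain at least one day")

    # Cap each day at the prescribed count so extra puffs do not inflate adherence.
    taken = sum(min(d, prescribed_per_day) for d in daily_doses)
    zero_days = sum(1 for d in daily_doses if d == 0)
    full_days = sum(1 for d in daily_doses if d >= prescribed_per_day)

    # Lengths of each run of consecutive zero-dose days.
    gaps = [
        len(list(run))
        for is_zero, run in groupby(daily_doses, key=lambda d: d == 0)
        if is_zero
    ]
    gaps = [g for g in gaps if g >= min_gap_days]

    return {
        "pct_doses_taken": 100 * taken / (prescribed_per_day * n_days),
        "pct_zero_days": 100 * zero_days / n_days,
        "pct_full_days": 100 * full_days / n_days,
        "intermissions_per_100_days": 100 * len(gaps) / n_days,
        "intermission_days_per_100_days": 100 * sum(gaps) / n_days,
    }


if __name__ == "__main__":
    example = [2, 2, 0, 0, 1, 2, 0, 2, 2, 2]  # ten made-up study days
    print(adherence_measures(example))
```

The resulting five-element vectors per patient are the kind of input that principal component analysis and k-means would then operate on; that clustering step is not reproduced here.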

    Geolocation with respect to personal privacy for the Allergy Diary app - a MASK study

    Background: Collecting data on the localization of users is a key issue for the MASK (Mobile Airways Sentinel network: the Allergy Diary) App. Data anonymization is a method of sanitization for privacy. The European Commission's Article 29 Working Party stated that geolocation information is personal data. To assess geolocation using the MASK method and to compare two anonymization methods in the MASK database to find an optimal privacy method. Methods: Geolocation was studied for all people who used the Allergy Diary App from December 2015 to November 2017 and who reported medical outcomes. Two different anonymization methods have been evaluated: Noise addition (randomization) and k-anonymity (generalization). Results: Ninety-three thousand one hundred and sixteen days of VAS were collected from 8535 users and 54,500 (58.5%) were geolocalized, corresponding to 5428 users. Noise addition was found to be less accurate than k-anonymity using MASK data to protect the users' life privacy. Discussion: k-anonymity is an acceptable method for the anonymization of MASK data and results can be used for other databases.
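To make the two families of anonymization named in the abstract above concrete, here is a minimal generic sketch, not the MASK implementation. Noise addition perturbs each coordinate by a random offset; k-anonymity by generalization coarsens coordinates to grid cells and suppresses any cell containing fewer than `k` distinct users. The grid size, offset range, and function names are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def add_noise(lat, lon, max_offset_deg=0.05, rng=random):
    """Randomization: shift each coordinate by a uniform random offset."""
    return (
        lat + rng.uniform(-max_offset_deg, max_offset_deg),
        lon + rng.uniform(-max_offset_deg, max_offset_deg),
    )


def to_cell(lat, lon, cell_deg=0.5):
    """Generalization: map a point to an integer grid-cell index."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def k_anonymize(records, k, cell_deg=0.5):
    """Keep only records whose grid cell contains at least k distinct users.

    records: iterable of (user_id, lat, lon) tuples.
    Returns a list of (user_id, cell) with exact coordinates removed.
    """
    users_per_cell = defaultdict(set)
    generalized = []
    for user_id, lat, lon in records:
        cell = to_cell(lat, lon, cell_deg)
        users_per_cell[cell].add(user_id)
        generalized.append((user_id, cell))
    return [(u, c) for u, c in generalized if len(users_per_cell[c]) >= k]


if __name__ == "__main__":
    # Made-up points: three users close together, one isolated user.
    data = [
        ("a", 48.85, 2.35),
        ("b", 48.86, 2.34),
        ("c", 48.90, 2.40),
        ("d", 43.61, 3.88),
    ]
    print(k_anonymize(data, k=3))  # the isolated user "d" is suppressed
    print(add_noise(48.85, 2.35, rng=random.Random(0)))
```

The trade-off this illustrates: noise addition keeps every record but distorts each location, whereas generalization keeps locations truthful at a coarser resolution and drops records that would remain identifiable.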

    Linkage of primary care prescribing records and pharmacy dispensing Records in the Salford Lung Study: application in asthma

    BACKGROUND: Records of medication prescriptions can be used in conjunction with pharmacy dispensing records to investigate the incidence of adherence, which is defined as observing the treatment plans agreed between a patient and their clinician. Using prescribing records alone fails to identify primary non-adherence; medications not being collected from the dispensary. Using dispensing records alone means that cases of conditions that resolve and/or treatments that are discontinued will be unaccounted for. While using a linked prescribing and dispensing dataset to measure medication non-adherence is optimal, this linkage is not routinely conducted. Furthermore, without a unique common event identifier, linkage between these two datasets is not straightforward. METHODS: We undertook a secondary analysis of the Salford Lung Study dataset. A novel probabilistic record linkage methodology was developed matching asthma medication pharmacy dispensing records and primary care prescribing records, using semantic (meaning) and syntactic (structure) harmonization, domain knowledge integration, and natural language feature extraction. Cox survival analysis was conducted to assess factors associated with the time to medication dispensing after the prescription was written. Finally, we used a simplified record linkage algorithm in which only identical records were matched, for a naïve benchmarking to compare against the results of our proposed methodology. RESULTS: We matched 83% of pharmacy dispensing records to primary care prescribing records. Missing data were prevalent in the dispensing records which were not matched - approximately 60% for both medication strength and quantity. A naïve benchmarking approach, requiring perfect matching, identified one-quarter as many matching prescribing records as our methodology. Factors associated with delay (or failure) to collect the prescribed medication from a pharmacy included season, quantity of medication prescribed, previous dispensing history and class of medication. Our findings indicate that over 30% of prescriptions issued were not collected from a dispensary (primary non-adherence). CONCLUSIONS: We have developed a probabilistic record linkage methodology matching a large percentage of pharmacy dispensing records with primary care prescribing records for asthma medications. This will allow researchers to link datasets in order to extract information about asthma medication non-adherence.
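The contrast drawn in the abstract above, between matching only identical records and matching after harmonization, can be sketched as follows. This is not the authors' probabilistic methodology: it swaps in a simple string-similarity score from Python's standard `difflib`, and the normalization rules, the threshold, and the example drug strings are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Syntactic harmonization (assumed rules): lowercase, split number
    from unit, collapse repeated whitespace."""
    text = text.lower().strip()
    text = re.sub(r"(\d)\s*(mcg|mg|ml)\b", r"\1 \2", text)
    return re.sub(r"\s+", " ", text)


def match_score(prescribed, dispensed):
    """Similarity in [0, 1] between two harmonized free-text records."""
    return SequenceMatcher(None, normalise(prescribed), normalise(dispensed)).ratio()


def link(prescriptions, dispensings, threshold=0.85):
    """Link each dispensing record to its best-scoring prescription,
    keeping the pair only if the score reaches the threshold."""
    links = []
    for d in dispensings:
        best = max(prescriptions, key=lambda p: match_score(p, d), default=None)
        if best is not None and match_score(best, d) >= threshold:
            links.append((d, best))
    return links


if __name__ == "__main__":
    # Made-up example records.
    prescriptions = ["Salbutamol 100mcg Inhaler", "Beclometasone 200mcg inhaler"]
    dispensings = ["SALBUTAMOL 100 MCG INHALER", "Prednisolone 5mg tablets"]

    exact = [(d, p) for d in dispensings for p in prescriptions if d == p]
    print("exact matches:", exact)                       # naive benchmark
    print("harmonized matches:", link(prescriptions, dispensings))
```

In this toy example the exact-match benchmark finds nothing, while harmonization recovers the salbutamol pair; a real linkage would also weigh fields such as dates, strength, and quantity.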

    A retrospective cohort study predicting and validating impact of the COVID-19 pandemic in individuals with chronic kidney disease

    Chronic kidney disease (CKD) is associated with increased risk of baseline mortality and severe COVID-19, but analyses across CKD stages and comorbidities are lacking. In prevalent and incident CKD, we investigated comorbidities, baseline risk, COVID-19 incidence, and predicted versus observed 1-year excess death. In national English data (NHSD-TRE; n=56 million), we conducted a retrospective cohort study in prevalent and incident CKD (March 2020 to March 2021) of prevalence of comorbidities by incident and prevalent CKD, SARS-CoV-2 infection and mortality. We assessed baseline mortality risk, incidence and outcome of infection by comorbidities, controlling for age, sex and vaccination. We compared observed versus predicted 1-year mortality at varying population infection rates (IR) and pandemic-related relative risks (RR) using our published model in pre-pandemic CKD cohorts (NHSD-TRE and CPRD). Among individuals with CKD (prevalent: 1,934,585; incident: 144,969), comorbidities were common (73.5% and 71.2% with ≥1 condition, and 13.2% and 11.2% with ≥3 conditions, in prevalent and incident CKD), and associated with SARS-CoV-2 infection, particularly dialysis/transplantation (OR 2.08, 95% CI 2.04-2.13) and heart failure (OR 1.73, 1.71-1.76), but not cancer (OR 1.01, 1.01-1.04). One-year all-cause mortality varied by age, sex, multimorbidity and CKD stage. Compared with 34,265 observed excess deaths, in NHSD-TRE and CPRD data respectively, we predicted 28,746 (83.9%) and 24,546 (71.6%) deaths (IR 10% and RR 3.0), and 23,754 (69.3%) and 20,283 (59.2%) deaths (observed IR 6.7% and RR 3.7). In the largest national-level study to date, individuals with CKD have a high burden of comorbidities and multimorbidity, high risk of pre-pandemic mortality and a high risk of pandemic mortality. Treatment of comorbidities, non-pharmaceutical measures, and vaccination are priorities in people with CKD, and management of long-term conditions is important during and beyond the pandemic.
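The percentages quoted in the abstract above are each predicted death count expressed as a share of the 34,265 observed excess deaths. A short check of that arithmetic, using only the figures given in the abstract (the variable and function names are illustrative):

```python
OBSERVED_EXCESS_DEATHS = 34_265

# Predicted deaths as reported in the abstract, keyed by (data source, scenario).
PREDICTED = {
    ("NHSD-TRE", "IR 10%, RR 3.0"): 28_746,
    ("CPRD", "IR 10%, RR 3.0"): 24_546,
    ("NHSD-TRE", "IR 6.7%, RR 3.7"): 23_754,
    ("CPRD", "IR 6.7%, RR 3.7"): 20_283,
}


def pct_of_observed(predicted_deaths, observed=OBSERVED_EXCESS_DEATHS):
    """Predicted deaths as a percentage of observed deaths, to one decimal."""
    return round(100 * predicted_deaths / observed, 1)


if __name__ == "__main__":
    for (source, scenario), deaths in PREDICTED.items():
        print(f"{source:8s} {scenario:16s} {deaths:>6,d} -> {pct_of_observed(deaths)}%")
```

All four scenarios fall below 100%, meaning the model under-predicted observed excess deaths in every case reported.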

    COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records

    Background: Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework. Methods: In this cohort study, we used eight linked National Health Service (NHS) datasets for people in England alive on Jan 23, 2020. Data on COVID-19 testing, vaccination, primary and secondary care records, and death registrations were collected until Nov 30, 2021. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity and encompassing five categories: positive SARS-CoV-2 test, primary care diagnosis, hospital admission, ventilation modality (four phenotypes), and death (three phenotypes). We constructed patient trajectories illustrating transition frequency and duration between phenotypes. Analyses were stratified by pandemic waves and vaccination status. Findings: Among 57 032 174 individuals included in the cohort, 13 990 423 COVID-19 events were identified in 7 244 925 individuals, equating to an infection rate of 12·7% during the study period. Of 7 244 925 individuals, 460 737 (6·4%) were admitted to hospital and 158 020 (2·2%) died. Of 460 737 individuals who were admitted to hospital, 48 847 (10·6%) were admitted to the intensive care unit (ICU), 69 090 (15·0%) received non-invasive ventilation, and 25 928 (5·6%) received invasive ventilation. Among 384 135 patients who were admitted to hospital but did not require ventilation, mortality was higher in wave 1 (23 485 [30·4%] of 77 202 patients) than wave 2 (44 220 [23·1%] of 191 528 patients), but remained unchanged for patients admitted to the ICU. Mortality was highest among patients who received ventilatory support outside of the ICU in wave 1 (2569 [50·7%] of 5063 patients). 15 486 (9·8%) of 158 020 COVID-19-related deaths occurred within 28 days of the first COVID-19 event without a COVID-19 diagnosis on the death certificate. 10 884 (6·9%) of 158 020 deaths were identified exclusively from mortality data with no previous COVID-19 phenotype recorded. We observed longer patient trajectories in wave 2 than wave 1. Interpretation: Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources. Funding: British Heart Foundation Data Science Centre, led by Health Data Research UK.
